Parallels in the sequential organization of birdsong and human speech.
Human speech possesses a rich hierarchical structure that allows meaning to be altered by words spaced far apart in time. In contrast, the sequential structure of nonhuman communication is thought to follow non-hierarchical, Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
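The information-decay analysis described here can be sketched with a simple plug-in estimator of mutual information at increasing sequence distances. The two-state Markov chain below is an illustrative toy signal, not data from the paper:

```python
from collections import Counter
import math
import random

def mutual_information(seq, d):
    """Plug-in estimate (in bits) of MI between elements d positions apart."""
    pairs = list(zip(seq, seq[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum(c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

# Toy signal: a two-state Markov chain that repeats its last element with
# probability 0.9. Its information decay should be roughly exponential,
# matching the short-range regime described in the abstract.
random.seed(0)
seq = ["A"]
for _ in range(20000):
    prev = seq[-1]
    seq.append(prev if random.random() < 0.9 else ("B" if prev == "A" else "A"))

mis = [mutual_information(seq, d) for d in (1, 2, 4, 8, 16)]
print([round(m, 3) for m in mis])  # decaying toward zero
```

A hierarchical generative process would instead produce values whose decay follows a power law at long distances, which is the signature the paper tests for.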
Parametric UMAP embeddings for representation and semi-supervised learning
UMAP is a non-parametric graph-based dimensionality reduction algorithm using
applied Riemannian geometry and algebraic topology to find low-dimensional
embeddings of structured data. The UMAP algorithm consists of two steps: (1)
Compute a graphical representation of a dataset (fuzzy simplicial complex), and
(2) Through stochastic gradient descent, optimize a low-dimensional embedding
of the graph. Here, we extend the second step of UMAP to a parametric
optimization over neural network weights, learning a parametric relationship
between data and embedding. We first demonstrate that Parametric UMAP performs
comparably to its non-parametric counterpart while conferring the benefit of a
learned parametric mapping (e.g. fast online embeddings for new data). We then
explore UMAP as a regularization, constraining the latent distribution of
autoencoders, parametrically varying global structure preservation, and
improving classifier accuracy for semi-supervised learning by capturing
structure in unlabeled data. Google Colab walkthrough:
https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharin
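The two-step structure described above can be sketched in plain numpy: a Gaussian-kernel graph stands in for the fuzzy simplicial complex, a linear map stands in for the neural network, and a squared-error graph-matching loss stands in for UMAP's cross-entropy objective. Everything here is an illustrative simplification, not the library's implementation (the real class is `umap.parametric_umap.ParametricUMAP`):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated 10-D Gaussian clusters, standardized.
X = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(6, 1, (20, 10))])
X = (X - X.mean(0)) / X.std(0)
n = len(X)

# Step (1), simplified: pairwise edge probabilities p_ij from a Gaussian
# kernel (a crude stand-in for UMAP's fuzzy simplicial complex).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
P = np.exp(-d2 / d2[d2 > 0].mean())
np.fill_diagonal(P, 0.0)

# Step (2), made parametric: learn the weights of a map f_W(x) = x @ W by
# gradient descent, matching q_ij = 1 / (1 + ||f(x_i) - f(x_j)||^2) to p_ij.
W = rng.normal(0, 0.1, (10, 2))
lr, losses = 0.02, []
for _ in range(1000):
    Z = X @ W
    dz = Z[:, None, :] - Z[None, :, :]
    s = (dz ** 2).sum(-1)
    q = 1.0 / (1.0 + s)
    losses.append((((q - P) ** 2).sum() - n) / n ** 2)  # drop constant diagonal
    grad_Z = (8.0 / n ** 2) * (((P - q) * q ** 2)[:, :, None] * dz).sum(1)
    W -= lr * X.T @ grad_Z

# Because the mapping is parametric, new data embed with one matrix multiply.
Z = X @ W
within = np.linalg.norm(Z[:20] - Z[:20].mean(0), axis=1).mean()
between = np.linalg.norm(Z[:20].mean(0) - Z[20:].mean(0))
print(losses[0] > losses[-1], within < between)
```

The fast-online-embedding benefit mentioned in the abstract is visible in the last step: embedding new points requires only a forward pass through the learned map, not a re-optimization of the whole graph.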
A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations
© The Author(s), 2022. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Thomas, M., Jensen, F. H., Averly, B., Demartsev, V., Manser, M. B., Sainburg, T., Roch, M. A., & Strandburg-Peshkin, A. A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations. The Journal of Animal Ecology, 91(8), (2022): 1567–1581, https://doi.org/10.1111/1365-2656.13754.
1. Background: The manual detection, analysis and classification of animal vocalizations in acoustic recordings is laborious and requires expert knowledge. Hence, there is a need for objective, generalizable methods that detect underlying patterns in these data, categorize sounds into distinct groups and quantify similarities between them. Among all computational methods that have been proposed to accomplish this, neighbourhood-based dimensionality reduction of spectrograms to produce a latent space representation of calls stands out for its conceptual simplicity and effectiveness.
2. Goal of the study/what was done: Using a dataset of manually annotated meerkat Suricata suricatta vocalizations, we demonstrate how this method can be used to obtain meaningful latent space representations that reflect the established taxonomy of call types. We analyse strengths and weaknesses of the proposed approach, give recommendations for its usage and show application examples, such as the classification of ambiguous calls and the detection of mislabelled calls.
3. What this means: All analyses are accompanied by example code to help researchers realize the potential of this method for the study of animal vocalizations.
This work was supported by HFSP Research Grant RGP0051/2019 to ASP, MBM and MAR, and funded by the Deutsche Forschungsgemeinschaft (DFG) under Germany's Excellence Strategy (EXC-2117-422037984). ASP received additional funding from the Gips-Schüle Stiftung, the Zukunftskolleg at the University of Konstanz and the Max-Planck-Institute of Animal Behaviour. VD was funded by the Minerva Stiftung and the Alexander von Humboldt Foundation.
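As a complement to the paper's own example code, the pipeline it describes can be condensed into a numpy-only sketch: calls are turned into log-spectrograms, flattened to fixed-length vectors, and projected to two dimensions. PCA via SVD stands in here for the neighbourhood-based reduction (UMAP) the guide actually recommends, and every signal parameter and the synthetic "call types" are illustrative assumptions:

```python
import numpy as np

def spectrogram(x, n_fft=128, hop=64):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq, time)

rng = np.random.default_rng(0)
sr, dur = 8000, 0.25
t = np.arange(int(sr * dur)) / sr

# Two synthetic "call types": low and high frequency-modulated tones.
calls, labels = [], []
for i in range(30):
    f0 = 500.0 if i % 2 == 0 else 2000.0
    sweep = np.sin(2 * np.pi * (f0 + 200 * t / dur) * t)
    calls.append(sweep + 0.05 * rng.normal(size=t.size))
    labels.append(0 if f0 == 500.0 else 1)

# Log-spectrograms flattened to fixed-length feature vectors.
feats = np.array([np.log1p(spectrogram(c)).ravel() for c in calls])

# 2-D embedding via SVD (PCA); the paper would use UMAP at this step.
centered = feats - feats.mean(0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
emb = centered @ Vt[:2].T
print(emb.shape)  # (30, 2)
```

With real data, the embedding step is where clustering and the classification of ambiguous or mislabelled calls described in point 2 would operate.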
American postdoctoral salaries do not account for growing disparities in cost of living
The National Institutes of Health (NIH) sets postdoctoral (postdoc) trainee
stipend levels that many American institutions and investigators use as a basis
for postdoc salaries. Although salary standards are held constant across universities, the cost of living in those universities' cities and towns varies widely. Across non-postdoc jobs, more expensive cities pay workers higher wages
that scale with an increased cost of living. This work investigates the extent
to which postdoc wages account for cost-of-living differences. More than 27,000
postdoc salaries across all US universities are analyzed alongside measures of
regional differences in cost of living. We find that postdoc salaries do not
account for cost-of-living differences, in contrast with the broader labor
market in the same cities and towns. Despite a modest increase in income in
high cost of living areas, real (cost of living adjusted) postdoc salaries
differ by 29% ($15k 2021 USD) between the least and most expensive areas.
Cities that produce greater numbers of tenure-track faculty relative to
students such as Boston, New York, and San Francisco are among the most
impacted by this pay disparity. The postdoc pay gap is growing and is well-positioned to impose a greater financial burden on economically disadvantaged groups and to contribute to faculty hiring disparities for women and racial minorities.
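The adjustment described above amounts to deflating nominal pay by a regional price index. A minimal sketch, with made-up regional price parity (RPP) values rather than the paper's data:

```python
# Deflate a nominal salary by a regional price parity (RPP) index,
# where RPP = 100 corresponds to the national average price level.
# The RPP values and salary below are hypothetical examples.

def real_salary(nominal, rpp):
    """Cost-of-living-adjusted salary in national-average dollars."""
    return nominal * 100.0 / rpp

# The same nominal stipend paid in a cheap vs. an expensive region.
nominal = 54000
cheap, expensive = real_salary(nominal, 88.0), real_salary(nominal, 118.0)
gap = cheap - expensive
print(round(cheap), round(expensive), round(gap))  # 61364 45763 15601
```

With these illustrative RPP values, an identical nominal stipend differs by roughly $15k in real terms, the same order as the gap the paper reports between the least and most expensive areas.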
Temporal organization in vocal communication: sequential structure, perceptual integration, and neural foundations
Our interactions with the world unfold over time. Whether it's speaking, where one word follows the next, or walking, where each step follows another, the organization of our behaviors in time tends to follow a predictable pattern. Those patterns are dictated by a multitude of underlying factors, influenced both by endogenous physiological factors like the rhythmic nature of our gait as well as by exogenous factors, like the social dynamics underlying turn-taking while speaking.
Despite decades of research studying the temporal organization of behavior, dating back to the work of influential biologists like Tinbergen, Lashley, and Dawkins, little is known about the physiological substrates that underlie the production or perception of the sequential organization of most aspects of behavior.
For example, despite widespread acknowledgment that physiological motor programs and many non-linguistic behaviors are hierarchical, few physiological investigations into the dynamics of behavior extend beyond low-order (Markovian) transition statistics. In this thesis, I build on the emerging field of computational neuroethology to further our understanding of what structure underlies the sequential organization of behavior, what physiological mechanisms might be involved in producing, perceiving, and representing sequential behavioral organization, and how sequential behavioral organization might have emerged developmentally and evolutionarily. Throughout the thesis, I draw primarily upon birdsong and human speech, developing methods to analyze the acoustic and temporal structure in vocal signals and then behaviorally and physiologically probing the underpinnings of sequential organization in the songbird. This work advances the field of computational neuroethology in several ways. I uncover novel acoustic structure in vocal signals, separating avian and mammalian vocalizations along a spectrum of vocal stereotypy.
I observe that both human speech and birdsong are characterized by a combination of long and short-range temporal patterning.
I find that the long-range temporal patterning characterizing human speech, believed to be underpinned by hierarchical linguistic organization, is present at the earliest developmental stages of human speech, well before complex syntax is produced.
I find that the perceptual integration of birdsong syllable sequences can be well explained by Bayesian models of probabilistic perceptual decision-making.
Finally, I find that sensory neural representations of syllable sequences are modulated by sequential context and that this modulation reflects the animal's underlying perceptual behavior.
In the following paragraphs, I give a brief overview of the methods and major results of the chapters comprising this thesis.

In Chapter \ref{chapter:review} I give an introduction to the emerging field of vocal computational neuroethology. This introduction contextualizes the following chapters in a review of current work, emphasizing current tools, challenges, and future directions in vocal neuroethology. I start with a discussion of low-level bioacoustics challenges and build up to a discussion of behavioral organization and physiology. I first discuss challenges in signal processing, such as dealing with noise and representing vocal signals as time-frequency representations. I then discuss machine learning approaches used to identify, segment, and label vocalizations. Next, I discuss how to extract relational structure between vocalizations and cluster latent projections of vocalizations. I then give an overview of methods for capturing temporal relationships in vocal sequences, outlining traditional Markovian descriptions of vocal structure and new tools, enabled by large datasets, for capturing long-range structure. I then move on to machine learning tools that can be used to systematically control and synthesize vocal signals from learned vocal spaces. Finally, I discuss how these techniques are being utilized in several active areas of neuroethology research.

In Chapter \ref{chapter:avgn} I develop a set of methods to visualize and quantify relational structure in vocalizations, which enable the analyses and experiments performed in the following chapters. I use graph-based dimensionality reduction to uncover local structure in vocal communication signals and apply that technique to 19 datasets consisting of vocalizations from 29 species, including songbirds, primates, cetaceans, rodents, and bats.
I observe that these methods uncover novel structure in animal vocal signals, including vocal dialects, acoustic units, behaviorally relevant signal information, and sub-syllabic structure.

In Chapter \ref{chapter:parametric_umap}, I extend the methods from Chapter \ref{chapter:avgn} by introducing Parametric UMAP, a graph-based dimensionality reduction algorithm that parametrically learns the relationship between data (here, vocal signals) and latent embeddings. The learned parametric embeddings enable the methods from Chapter \ref{chapter:avgn} to be applied in real-time, closed-loop settings and over larger datasets. I show that this algorithm has applications in semi-supervised settings and provides additional control over the trade-off between capturing global and local structure in embeddings.

In Chapter \ref{chapter:parallels} I explore the long- and short-range temporal patterning of vocal sequences in birdsong and human speech. I use an information-theoretic framework to analyze statistical dependencies as a function of the distance between elements in vocal sequences. I find that both birdsong and human speech exhibit two forms of structure: short-range relationships captured by Markovian dynamics over short timescales, and long-range relationships that follow a power law over longer timescales. In language, the observed short-range organization conforms to phonological processes, which are well described by finite-state dynamics, while the long-range organization suggests more complex dynamics, such as underlying hierarchical organization. Previous analyses of birdsong have identified only short-range Markovian dynamics, making our observation of long-range dynamics in birdsong novel.

In Chapter \ref{chapter:lri} I extend the analysis of human speech from Chapter \ref{chapter:parallels} to language acquisition.
By analyzing corpora of speech throughout language development, we can observe the time course of the emergence of long- and short-range relationships over development. Surprisingly, I find that long-range statistical dependencies are present in children's speech as early as 6-12 months, well before complex syntactic structure is produced. I discuss these results alongside emerging evidence from computational ethology that long-range relationships are also common to non-linguistic behavioral signals from animals as diverse as zebrafish, drosophila, and whales. Although previous analyses have suggested that long-range relationships are the product of hierarchical linguistic structure such as syntax and discourse structure, our observations in developmental speech and non-linguistic behaviors suggest that other mechanisms may also be at play.

Finally, in Chapter \ref{chapter:cdcp} I probe how sequential dependencies in vocal sequences are integrated behaviorally and physiologically. I developed a behavioral task in which European starlings are trained to classify morphs of starling song syllables synthesized from an interpolation between two points in the latent space of a neural network (a variational autoencoder). These morph syllables are preceded by a separate cue syllable, which carries predictive information about the category of the following morph syllable. I find that classification of the morph syllable is contextually modulated by the predictive probability of the cue syllable, and that this modulation is well explained by a model of Bayesian integration. With the same behavioral paradigm, I then record chronic electrophysiology data from auditory nuclei while birds perform this context-dependent categorical perceptual decision-making task. I find that neural activity patterns reflect several aspects of our model of perceptual behavior, including the uncertainty in decision-making and prediction-related perceptual modulation.
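The Bayesian-integration account in the last chapter summary can be written down compactly: the choice probability combines a sensory likelihood of the morph with a prior set by the cue syllable's predictive probability. The Gaussian likelihood and all parameter values below are illustrative assumptions, not fitted values from the experiment:

```python
import math

def p_choose_B(morph_position, cue_prob_B, sigma=0.2):
    """Posterior probability of category B for a morph at position in [0, 1],
    where 0 is a clear A exemplar and 1 is a clear B exemplar."""
    # Gaussian sensory likelihoods of the morph under each category prototype.
    like_A = math.exp(-((morph_position - 0.0) ** 2) / (2 * sigma ** 2))
    like_B = math.exp(-((morph_position - 1.0) ** 2) / (2 * sigma ** 2))
    # Bayes' rule: the cue's predictive probability acts as the prior.
    post_B = like_B * cue_prob_B
    return post_B / (post_B + like_A * (1.0 - cue_prob_B))

# At the ambiguous midpoint, the cue's predictive probability dominates choice;
# near the endpoints, the sensory evidence dominates.
print(round(p_choose_B(0.5, 0.5), 2))  # neutral cue -> 0.5
print(round(p_choose_B(0.5, 0.8), 2))  # B-predictive cue -> 0.8
```

The qualitative prediction, that cue-driven modulation is strongest for the most ambiguous morphs, is the signature of Bayesian integration described in the thesis.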
Long-range sequential dependencies precede complex syntactic production in language acquisition
To convey meaning, language relies on hierarchically organized, long-range relationships spanning words, phrases, sentences, and discourse. As the distances between elements in language sequences increase, the strength of the long-range relationships between those elements decays following a power law. This power-law relationship has been attributed variously to long-range sequential organization present in language syntax, semantics, and discourse structure. However, non-linguistic behaviors in numerous phylogenetically distant species, ranging from humpback whale song to fruit fly motility, exhibit similar long-range statistical dependencies. We therefore hypothesized that long-range statistical dependencies in speech may occur independently of linguistic structure. To test this hypothesis, we measured long-range dependencies in speech corpora from children (aged 6 months to 12 years). We find that adult-like power-law statistical dependencies are present in human vocalizations prior to the production of complex linguistic structure. These linguistic structures cannot, therefore, be the sole cause of long-range statistical dependencies in language.
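The power-law claim above implies a concrete model-comparison step: fit both an exponential and a power-law decay to dependency estimates and compare residuals. A sketch on synthetic decay data, not the paper's corpora:

```python
import numpy as np

rng = np.random.default_rng(0)
d = np.arange(1, 101, dtype=float)
# Synthetic dependency curve: a power law with mild multiplicative noise.
mi = 2.0 * d ** -0.8 * np.exp(rng.normal(0, 0.02, d.size))

# Power law:  log MI = log a - b * log d  -> linear in log d.
b_pow = np.polyfit(np.log(d), np.log(mi), 1)
# Exponential: log MI = log a - c * d    -> linear in d.
b_exp = np.polyfit(d, np.log(mi), 1)

resid_pow = np.log(mi) - np.polyval(b_pow, np.log(d))
resid_exp = np.log(mi) - np.polyval(b_exp, d)
print((resid_pow ** 2).sum() < (resid_exp ** 2).sum())  # power law fits better
```

On real speech data the interesting case is the mixture regime: exponential decay dominating at short distances and power-law decay at long distances, which is what distinguishes Markovian from hierarchical signatures.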
Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires.
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.